By the definition of the trace, for any matrix $A\in \mathbb{R}^{m \times n}$ (so that $A^TA \in \mathbb{R}^{n \times n}$):
$$ \begin{align} tr(A^TA)&=\sum_{i=1}^n (A^TA)_{ii}\\ &=\sum_{i=1}^n \sum_{j=1}^m A^T_{ij}A_{ji}\\ &=\sum_{i=1}^n \sum_{j=1}^m A_{ji}A_{ji}\\ &=\sum_{i=1}^n \sum_{j=1}^m A_{ji}^2, \end{align} $$which is exactly the squared Frobenius norm $||A||^2_F$, namely the sum of all squared entries.
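As a quick numerical sanity check of this identity (not part of the proof), the following NumPy sketch compares $tr(A^TA)$ with the sum of squared entries for a random matrix; the shape and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # arbitrary m = 5, n = 3

lhs = np.trace(A.T @ A)  # tr(A^T A)
rhs = np.sum(A**2)       # sum of all squared entries, i.e. ||A||_F^2
assert np.isclose(lhs, rhs)
```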
Therefore, we have: $$ \begin{align} ||Y-XB||^2_F+\lambda ||B||^2_F&=tr((Y-XB)^T(Y-XB))+\lambda\, tr(B^TB)\\ &=tr((Y^T-B^TX^T)(Y-XB))+\lambda\, tr(B^TB)\\ &=tr(Y^TY-Y^TXB-B^TX^TY+B^TX^TXB)+\lambda\, tr(B^TB)\\ &=tr(Y^TY)-tr(Y^TXB)-tr(B^TX^TY)+tr(B^TX^TXB)+\lambda\, tr(B^TB)\\ &=tr(Y^TY)-tr(Y^TXB)-tr((Y^TXB)^T)+tr(B^TX^TXB)+\lambda\, tr(B^TB)\\ &=tr(Y^TY)-2\,tr(Y^TXB)+tr(B^TX^TXB)+\lambda\, tr(B^TB), \end{align} $$ where the last step uses $tr(M)=tr(M^T)$, as desired.
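This expansion can be checked numerically the same way; in the sketch below the shapes, seed, and $\lambda$ value are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, k = 6, 4, 3                      # arbitrary sizes
X, Y = rng.standard_normal((n, r)), rng.standard_normal((n, k))
B, lam = rng.standard_normal((r, k)), 0.5

lhs = np.linalg.norm(Y - X @ B, "fro")**2 + lam * np.linalg.norm(B, "fro")**2
rhs = (np.trace(Y.T @ Y) - 2 * np.trace(Y.T @ X @ B)
       + np.trace(B.T @ X.T @ X @ B) + lam * np.trace(B.T @ B))
assert np.isclose(lhs, rhs)
```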
We can then compute: $$ \begin{align} \frac{\partial^2}{\partial B^2} (||Y-XB||^2_F+\lambda ||B||^2_F)&=\frac{\partial}{\partial B}[2(X^TX+\lambda I)B-2X^TY]\\ &=2\frac{\partial}{\partial B}[(X^TX+\lambda I)BI]-\frac{\partial}{\partial B}[2X^TY]\\ &=2(X^TX+\lambda I)^T-0\\ &=2[(X^TX)^T+(\lambda I)^T]\\ &=2(X^TX+\lambda I), \end{align} $$ as desired.
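To make the first-derivative step concrete, here is a finite-difference sketch (not part of the derivation) comparing the stated gradient $2(X^TX+\lambda I)B-2X^TY$ against numerical derivatives; all shapes, the seed, and $\lambda$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, k, lam, eps = 6, 4, 3, 0.5, 1e-6
X, Y = rng.standard_normal((n, r)), rng.standard_normal((n, k))
B = rng.standard_normal((r, k))

def f(B):
    # the ridge objective ||Y - XB||_F^2 + lam * ||B||_F^2
    return np.linalg.norm(Y - X @ B, "fro")**2 + lam * np.linalg.norm(B, "fro")**2

grad = 2 * (X.T @ X + lam * np.eye(r)) @ B - 2 * X.T @ Y  # claimed gradient
num = np.zeros_like(B)
for p in range(r):
    for q in range(k):
        E = np.zeros_like(B); E[p, q] = eps
        num[p, q] = (f(B + E) - f(B - E)) / (2 * eps)   # central difference
assert np.allclose(grad, num, atol=1e-4)
```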
According to $\textbf{Proposition 7}$ from the \textcolor{magenta}{Ref}, to show $||Y-XB||^2_F$ is convex, it suffices to show $\frac{\partial^2}{\partial B^2} ||Y-XB||^2_F \succcurlyeq 0$:
$$ \begin{align} \frac{\partial^2}{\partial B^2} ||Y-XB||^2_F &=\frac{\partial}{\partial B} (-2X^TY+2X^TXB)\\ &=\frac{\partial}{\partial B} 2(X^TX)BI\\ &=2(X^TX)^TI^T\\ &=2(X^TX). \end{align} $$For all $w \in \mathbb{R}^r$ with $w \neq \vec{0}$: $$ \begin{align} w^T[2(X^TX)]w&=2w^TX^TXw\\ &=2(Xw)^TXw\\ &=2||Xw||^2\\ &\geq 0, \end{align} $$ so $2(X^TX) \succcurlyeq 0$.
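Numerically, the positive semidefiniteness of $2X^TX$ shows up as nonnegative eigenvalues; a minimal NumPy check (shape and seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 4))
H = 2 * X.T @ X                   # the Hessian derived above

eigvals = np.linalg.eigvalsh(H)   # eigvalsh: H is symmetric
assert np.all(eigvals >= -1e-10)  # all eigenvalues nonnegative => PSD
```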
Hence by $\textbf{Corollary }6$, every critical point $B^*$ with $\nabla f(B^*)=0$ is a global minimizer, hence a local minimizer, where $f:B\mapsto ||Y-XB||^2_F$.
By $\textbf{Corollary }6$, since we have already shown that $f$ is convex, any $B^*$ with $\nabla f(B^*)=0$ is a global minimizer: $$ \begin{align} \nabla f= \frac{\partial}{\partial B} ||Y-XB||^2_F&=0\\ \Rightarrow -2X^TY+2X^TXB&=0\\ X^TXB&=X^TY\\ B^*&=(X^TX)^{-1}X^TY, \end{align} $$ assuming $X^TX$ is invertible.
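As a sanity check (assuming $X^TX$ is invertible, which holds almost surely for the random $X$ below), the closed form can be compared with a library least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(4)
X, Y = rng.standard_normal((8, 4)), rng.standard_normal((8, 3))

B_closed = np.linalg.solve(X.T @ X, X.T @ Y)     # (X^T X)^{-1} X^T Y
B_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)  # library least-squares solution
assert np.allclose(B_closed, B_lstsq)
```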
To show $X^TX+\lambda I$ is invertible, it suffices to show that $X^TX+\lambda I$ is positive definite. For all $w \in \mathbb{R}^r$ with $w \neq \vec{0}$: $$ \begin{align} w^T(X^TX+\lambda I)w&=(w^TX^TX+\lambda w^T I)w\\ &=w^TX^TXw+\lambda w^T Iw\\ &=(Xw)^TXw+\lambda w^Tw\\ &=||Xw||^2+\lambda||w||^2\\ &> 0, \end{align} $$
where the last inequality holds because $w \neq \vec{0}$ and $\lambda >0$. Therefore, $X^TX+\lambda I$ always has an inverse. As in \textbf{iv.}, we find the global minimizer $\tilde{B}^*$ such that $\nabla \tilde{f}(\tilde{B}^*)=0$: $$ \begin{align} \nabla \tilde{f}=\frac{\partial}{\partial B} (||Y-XB||^2_F+\lambda||B||^2_F)&=0\\ \Rightarrow 2(X^TX+\lambda I)B-2X^TY&=0\\ (X^TX+\lambda I)B&=X^TY\\ \tilde{B}^*&=(X^TX+\lambda I)^{-1}X^TY, \end{align} $$ $\textit{Q.E.D.}$
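Again as a numerical illustration rather than part of the proof, the sketch below checks that $X^TX+\lambda I$ is positive definite and that the ridge solution is a critical point; shapes, seed, and $\lambda$ are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)
X, Y, lam = rng.standard_normal((8, 4)), rng.standard_normal((8, 3)), 0.5

A = X.T @ X + lam * np.eye(4)
assert np.all(np.linalg.eigvalsh(A) > 0)  # positive definite => invertible

B = np.linalg.solve(A, X.T @ Y)           # (X^T X + lam I)^{-1} X^T Y
grad = 2 * A @ B - 2 * X.T @ Y            # gradient of the ridge objective at B
assert np.allclose(grad, 0)               # B is a critical point
```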
Suppose $B\in \mathbb{R}^{n \times l}$, $X\in \mathbb{R}^{l \times m}$, $C\in \mathbb{R}^{m\times n}$. For each entry $x_{pq}$ of $X$:
$$ \begin{align} \frac{\partial}{\partial x_{pq}} tr(BXC)&=\frac{\partial}{\partial x_{pq}} \sum_{i=1}^n (BXC)_{ii}\\ &=\frac{\partial}{\partial x_{pq}} \sum_{i=1}^n \sum_{j=1}^l B_{ij}(XC)_{ji}\\ &=\frac{\partial}{\partial x_{pq}} \sum_{i=1}^n \sum_{j=1}^l \sum_{k=1}^m B_{ij}x_{jk} C_{ki}\\ &=\frac{\partial}{\partial x_{pq}} \sum_{i=1}^n B_{ip}\,x_{pq}\, C_{qi}\\ &=\sum_{i=1}^n B_{ip}C_{qi}\\ &=\sum_{i=1}^n B^T_{pi}C^T_{iq}\\ &=(B^TC^T)_{pq}, \end{align} $$where only the terms with $j=p$, $k=q$ depend on $x_{pq}$. Therefore: $$ \frac{\partial}{\partial X} tr(BXC)=B^TC^T, $$
as desired.
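A finite-difference sketch of this identity (dimensions and seed arbitrary, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(6)
n, l, m, eps = 3, 4, 5, 1e-6
B, X, C = (rng.standard_normal(s) for s in [(n, l), (l, m), (m, n)])

analytic = B.T @ C.T                 # claimed derivative of tr(BXC) w.r.t. X
num = np.zeros_like(X)
for p in range(l):
    for q in range(m):
        E = np.zeros_like(X); E[p, q] = eps
        num[p, q] = (np.trace(B @ (X + E) @ C)
                     - np.trace(B @ (X - E) @ C)) / (2 * eps)
assert np.allclose(analytic, num, atol=1e-6)
```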
Suppose $X$ has the same dimensions as before, so that $X^T \in \mathbb{R}^{m \times l}$. Now let $\tilde{B}\in \mathbb{R}^{n \times m}$ and $\tilde{C} \in \mathbb{R}^{l \times n}$, so that $\tilde{B}X^T\tilde{C} \in \mathbb{R}^{n \times n}$. Therefore:
$$ \begin{align} tr(\tilde{B}X^T\tilde{C})&=tr[(\tilde{B}X^T\tilde{C})^T]\\ &=tr(\tilde{C}^TX\tilde{B}^T), \end{align} $$where $\tilde{C}^T \in \mathbb{R}^{n \times l}$ and $\tilde{B}^T \in \mathbb{R}^{m \times n}$, which satisfy the dimension conditions on the matrices in $(i)$. Using the conclusion from $(i)$, we then get:
$$ \frac{\partial}{\partial X} tr(\tilde{B}X^T\tilde{C})=(\tilde{C}^T)^T(\tilde{B}^T)^T=\tilde{C}\tilde{B}, $$that is, $\frac{\partial}{\partial X} tr(BX^TC)=CB$, as desired.
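The same finite-difference style of check applies to $\frac{\partial}{\partial X} tr(BX^TC)=CB$, with dimensions matching those assumed above (seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
l, m, n, eps = 4, 5, 3, 1e-6
B, X, C = (rng.standard_normal(s) for s in [(n, m), (l, m), (l, n)])

analytic = C @ B                     # claimed derivative of tr(B X^T C) w.r.t. X
num = np.zeros_like(X)
for p in range(l):
    for q in range(m):
        E = np.zeros_like(X); E[p, q] = eps
        num[p, q] = (np.trace(B @ (X + E).T @ C)
                     - np.trace(B @ (X - E).T @ C)) / (2 * eps)
assert np.allclose(analytic, num, atol=1e-6)
```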
Suppose $X\in \mathbb{R}^{m \times n}$, so $X^T\in \mathbb{R}^{n \times m}$ and hence $B\in \mathbb{R}^{m\times m}$. For each entry $x_{pq}$ of $X$:
$$ \begin{align} \frac{\partial}{\partial x_{pq}} tr(X^TBX)&=\frac{\partial}{\partial x_{pq}} \sum_{i=1}^n (X^TBX)_{ii}\\ &=\frac{\partial}{\partial x_{pq}} \sum_{i=1}^n \sum_{j=1}^m (X^TB)_{ij}X_{ji}\\ &=\frac{\partial}{\partial x_{pq}} \sum_{i=1}^n \sum_{j=1}^m \sum_{k=1}^m X^T_{ik} B_{kj} X_{ji}\\ &=\frac{\partial}{\partial x_{pq}} \sum_{i=1}^n \sum_{j=1}^m \sum_{k=1}^m B_{kj}\,x_{ki} x_{ji}\\ &=\underbrace{\frac{\partial}{\partial x_{pq}} B_{pp} x_{pq}^2}_\text{$k=j=p,\,i=q$}+\underbrace{\frac{\partial}{\partial x_{pq}} \sum_{j\neq p} B_{pj}\, x_{pq} x_{jq}}_\text{$k=p\neq j,\,i=q$}+\underbrace{\frac{\partial}{\partial x_{pq}} \sum_{k\neq p} B_{kp}\, x_{kq} x_{pq}}_\text{$j=p\neq k,\,i=q$}\\ &= 2B_{pp}x_{pq}+\sum_{j\neq p} B_{pj} x_{jq} +\sum_{k\neq p} B_{kp} x_{kq}\\ &= \sum_{j=1}^m B_{pj} X_{jq} +\sum_{k=1}^m B^T_{pk} X_{kq}\\ &=(BX)_{pq}+(B^TX)_{pq}, \end{align} $$so $\frac{\partial}{\partial X} tr(X^TBX)=BX+B^TX$, as desired. $\textbf{Note}$ that in the second-to-last step the term $2B_{pp}x_{pq}$ is split into two copies of $B_{pp}x_{pq}$, one completing the sum over $j\neq p$ and one completing the sum over $k\neq p$, which yields the full matrix-multiplication form.
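A numerical check of $\frac{\partial}{\partial X} tr(X^TBX)=BX+B^TX$, with arbitrary dimensions and seed:

```python
import numpy as np

rng = np.random.default_rng(8)
m, n, eps = 4, 3, 1e-6
B, X = rng.standard_normal((m, m)), rng.standard_normal((m, n))

analytic = B @ X + B.T @ X           # claimed derivative of tr(X^T B X) w.r.t. X
num = np.zeros_like(X)
for p in range(m):
    for q in range(n):
        E = np.zeros_like(X); E[p, q] = eps
        num[p, q] = (np.trace((X + E).T @ B @ (X + E))
                     - np.trace((X - E).T @ B @ (X - E))) / (2 * eps)
assert np.allclose(analytic, num, atol=1e-6)
```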
Let $C\in \mathbb{R}^{l \times l}$ and the new $X\in \mathbb{R}^{l \times m}$, so that $CX \in \mathbb{R}^{l\times m}$ and $X^TC\in \mathbb{R}^{m \times l}$; the new $B$ is then in $\mathbb{R}^{m \times n}$. Let $\tilde{X}=XB$ and let $I \in \mathbb{R}^{l \times l}$ be the identity matrix. We have:
$$ \begin{align} \frac{\partial}{\partial X}tr(B^TX^TCXB)&=\frac{\partial}{\partial X}tr[(XB)^TC(XB)]\\ &=\frac{\partial}{\partial X}tr(\tilde{X}^TC\tilde{X})\\ &=\left(\frac{\partial\,tr(\tilde{X}^TC\tilde{X})}{\partial \tilde{X}}\bigg|_{\tilde{X}=XB}\right)B^T\\ &=(C\tilde{X}+C^T\tilde{X})B^T\\ &=(CXB+C^TXB)B^T\\ &=CXBB^T+C^TXBB^T, \end{align} $$as desired. Note that the third step uses the chain rule from the Ref (The Matrix Cookbook, Petersen & Pedersen, 2012, p. 15), where the factor $B^T$ comes from $\frac{\partial}{\partial X}tr(IXB)=I^TB^T=B^T$ by part $(i)$, and the outer derivative $C\tilde{X}+C^T\tilde{X}$ is the conclusion of part $(iii)$.
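A final finite-difference sketch for this identity (dimensions as in the setup above, seed arbitrary, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(9)
l, m, n, eps = 4, 3, 2, 1e-6
C, X, B = (rng.standard_normal(s) for s in [(l, l), (l, m), (m, n)])

analytic = C @ X @ B @ B.T + C.T @ X @ B @ B.T   # claimed derivative
f = lambda X: np.trace(B.T @ X.T @ C @ X @ B)    # the scalar objective
num = np.zeros_like(X)
for p in range(l):
    for q in range(m):
        E = np.zeros_like(X); E[p, q] = eps
        num[p, q] = (f(X + E) - f(X - E)) / (2 * eps)
assert np.allclose(analytic, num, atol=1e-6)
```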
Write
$$ X= \begin{bmatrix} X_1^T\\ \vdots\\ X_N^T\\ \end{bmatrix}, \qquad X_i=\phi(x_i)=(1,\, x_i,\, \cdots,\, x_i^M)^T. $$From the definition of the likelihood, with $\hat{Y}_i \sim N(\phi(x_i)^Tw,\sigma ^2)$ and $f_{\hat{Y}_i}$ denoting the density of $\hat{Y}_i$: $$ \begin{align} L(y_1,\cdots, y_N;w,\sigma)&=\prod_{i=1}^N f_{\hat{Y}_i}(y_i)\\ &=\prod_{i=1}^N \frac{1}{\sigma \sqrt{2 \pi}} \exp{\left[-\frac{(y_i-\sum_{j=0}^M x_i^jw_j)^2}{2\sigma ^2}\right]}\\ &=\left(\frac{1}{\sigma \sqrt{2 \pi}}\right) ^N \prod_{i=1}^N \exp{\left[-\frac{(y_i-\sum_{j=0}^M x_i^jw_j)^2}{2\sigma ^2}\right]}\\ &=(2\pi\sigma^2)^{-\frac{N}{2}} \exp{\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^N \left(y_i- \sum_{j=0}^M x_i^jw_j\right)^2\right]}\\ &=(2\pi\sigma^2)^{-\frac{N}{2}} \exp{\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^N (y_i- X_i^Tw)^2\right]}\\ &=(2\pi\sigma^2)^{-\frac{N}{2}} \exp{\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^N [Y_i-(Xw)_i]^2\right\}}\\ &=(2\pi\sigma^2)^{-\frac{N}{2}} \exp{\left[ -\frac{1}{2\sigma^2} ||Y-Xw||^2_F\right]}, \end{align} $$ as desired, where $x_i^j$ denotes the $j$-th power of the scalar input $x_i$.
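Since $\log L = -\frac{N}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}||Y-Xw||^2_F$, maximizing the likelihood over $w$ is the same as minimizing $||Y-Xw||^2_F$. The closed form can be verified against a direct product of normal densities; this sketch assumes SciPy is available and uses arbitrary synthetic data:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(10)
N, M, sigma = 20, 3, 0.7
x = rng.uniform(-1, 1, N)
X = np.vander(x, M + 1, increasing=True)  # rows phi(x_i) = (1, x_i, ..., x_i^M)
w = rng.standard_normal(M + 1)
y = X @ w + sigma * rng.standard_normal(N)

# log-likelihood as the sum of log normal densities...
ll_direct = norm.logpdf(y, loc=X @ w, scale=sigma).sum()
# ...and via the closed form derived above
ll_closed = -N/2 * np.log(2*np.pi*sigma**2) - np.sum((y - X @ w)**2) / (2*sigma**2)
assert np.isclose(ll_direct, ll_closed)
```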